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Abstract 

The proximity operator of a convex function is a natural extension of the notion of a 
projection operator onto a convex set. This tool, which plays a central role in the analysis and 
the numerical solution of convex optimization problems, has recently been introduced in the 
arena of inverse problems and, especially, in signal processing, where it has become increasingly 
important. In this paper, we review the basic properties of proximity operators which are 
relevant to signal processing and present optimization methods based on these operators. These 
proximal splitting methods are shown to capture and extend several well-known algorithms in 
a unifying framework. Applications of proximal methods in signal recovery and synthesis are 
discussed. 
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1 Introduction 

Early signal processing methods were essentially linear, as they were based on classical func- 
tional analysis and linear algebra. With the development of nonlinear analysis in mathematics 
in the late 1950s and early 1960s (see the bibliographies of [51 1142] ) and the availability of 
faster computers, nonlinear techniques have slowly become prevalent. In particular, convex 
optimization has been shown to provide efficient algorithms for computing reliable solutions in 
a broadening spectrum of applications. 

Many signal processing problems can in fine be formulated as convex optimization problems 
of the form 

minimize /i (x) H h /,„ (a;) , ( 1 ) 

where /i, . . . , are convex functions from to ]— oo, +cx)]. A major difficulty that arises 
in solving this problem stems from the fact that, typically, some of the functions are not 
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differentiable, which rules out conventional smooth optimization techniques. In this paper, 
we describe a class of efficient convex optimization algorithms to solve ([T]). These methods 
proceed by splitting in that the functions /i , . . . , fm are used individually so as to yield an 
easily implementable algorithm. They are called proximal because each nonsmooth function in 
([T]) is involved via its proximity operator. Although proximal methods, which can be traced 
back to the work of Martinet |98) . have been introduced in signal processing only recently 
[iSl [55] , their use is spreading rapidly. 

Our main objective is to familiarize the reader with proximity operators, their main proper- 
ties, and a variety of proximal algorithms for solving signal and image processing problems. The 
power and flexibility of proximal methods will be emphasized. In particular, it will be shown 
that a number of apparently unrelated, well-known algorithms (e.g., iterative thresholding, pro- 
jected Landweber, projected gradient, alternating projections, alternating-direction method of 
multipliers, alternating split Bregman) are special instances of proximal algorithms. In this 
respect, the proximal formalism provides a unifying framework for analyzing and developing a 
broad class of convex optimization algorithms. Although many of the subsequent results are 
extendible to infinite-dimensional spaces, we restrict ourselves to a finite-dimensional setting 
to avoid technical digressions. 

The paper is organized as follows. Proximity operators are introduced in Section [2l where 
we also discuss their main properties and provide examples. In Sections|3]and|4l we describe the 
main proximal splitting algorithms, namely the forward-backward algorithm and the Douglas- 
Rachford algorithm. In Section [51 we present a proximal extension of Dykstra's projection 
method which is tailored to problems featuring strongly convex objectives. Composite problems 
involving linear transformations of the variables are addressed in Section [Sj The algorithms 
discussed so far are designed for to = 2 functions. In Section [3 we discuss parallel variants 
of these algorithms for problems involving to > 2 functions. Concluding remarks are given in 
Section [1 

Notation. We denote by the usual TV-dimensional Euclidean space, by || -H its norm, and 
by I the identity matrix. Standard definitions and notation from convex analysis will be used 
[f3l|83[Tl4]. The domain of a function / : ]-oo,-hoo] isdom/ = {x £ < -foo}. 

ro(M^) is the class of lower semicontinuous convex functions from to ]— oo, -l-oo] such that 
dom/ ^ 0. Let / e ro(M^). The conjugate of / is the function /* e ro(M^) defined by 

/* : ^ ]-oo, +oo] -.u^ sup x'^u - /(x), (2) 



and the subdifferential of / is the set-valued operator 

df:R^^2^":x^{ueR^\iyyeR^) (y - xf u + f{x) < f{y)} . (3) 
Let C be a nonempty subset of R^. The indicator function of C is 



0, iixeC; 
^c-x^\ (4) 
-foo, it X f C, 



the support function of C is 



<Tc — t-c '■ ^ ^ ]^oo, +oo] : u i-> supu x, (5) 

the distance from x € R^ to C is dc{x) = inf^gcll^^ — vW, and the relative interior of C 
(i.e., interior of C relative to its afiine hull) is the nonempty set denoted by riC. If C is 
closed and convex, the projection of a; € M''^ onto C is the unique point Pcx £ C such that 
dc{x) = \\x - Pcx\. 



2 From projection to proximity operators 

One of the first widely used convex optimization splitting algorithms in signal processing 
is POCS (Projection Onto Convex Sets) [HU [42l I141j . This algorithm is employed to re- 
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cover /synthesize a signal satisfying simultaneously several convex constraints. Such a problem 
can be formalized within the framework of Q by letting each function be the indicator 
function of a nonempty closed convex set Ci modeling a constraint. This reduces ([1]) to the 
classical convex feasibiUty problem [3ll |42l HH |86l [93l [HH [Ml M [Ml] 

m 

find X e Pi C,. (6) 

1=1 

The POCS algorithm |25[ 1141] activates each set Ci individually by means of its projection 
operator Pp. . It is governed by the updating rule 

Xn+l ^ PCi ■ ■ ■ Pc^Xn- (7) 

When PlilLi Ci ^ the sequence (a;„)„gN thus produced converges to a solution to ^ pS] . 
Projection algorithms have been enriched with many extensions of this basic iteration to solve 
dni) irOl [321 HSl [SO] ■ Variants have also been proposed to solve more general problems, e.g., that 
of finding the projection of a signal onto an intersection of convex sets [121 HT] 1137] . Beyond 
such problems, however, projection methods are not appropriate and more general operators are 
required to tackle ([T]). Among the various generalizations of the notion of a convex projection 
operator that exist [lOl [Til [44l [90] , proximity operators are best suited for our purposes. 

The projection Pqx of x G M.^ onto the nonempty closed convex set C C is the solution 
to the problem 

minimize tc(2/) + -|la; - y|p. (8) 

Under the above hypotheses, the function oq' belongs to ro(M^). In 1962, Moreau [101] pro- 
posed the following extension of the notion of a projection operator, whereby the function lc 
in ^ is replaced by an arbitrary function / S ro(IR^). 

Definition 2.1 (Proximity operator) Let f E ro(M^). For every x £ M^, the minimiza- 
tion problem 

minimize f{y) + -||x - yf (9) 

admits a unique solution, which is denoted by prox^a;. The operator prox^: M.^ — > thus 
defined is the proximity operator of f. 

Let / e ro(M^). The proximity operator of / is characterized by the inclusion 

(V(a;,p) e X M^) p = prox^. x ^ x-pe df{p), (10) 

which reduces to 

(V(a;,p) G X M^) p = prox^. a; ^ x-p = Vf{p) (11) 

if / is differentiable. Proximity operators have very attractive properties that make them 
particularly well suited for iterative minimization algorithms. For instance, proxy is firmly 
nonexpansive, i.e., 

(Va; e K^)(Vj/ G M^) Uprox^x - prox^y|p + |l(a; - prox^a;) - {y - prox^j/)|p 

<\\x-y\\\ (12) 

and its fixed point set is precisely the set of minimizers of /. Such properties allow us to envi- 
sion the possibility of developing algorithms based on the proximity operators (proxj. )i<i<,„ to 
solve ([1]), mimicking to some extent the way convex feasibility algorithms employ the projection 
operators {Pci)i<i<m to solve As shown in Table [T] proximity operators enjoy many addi- 
tional properties. One will find in Table [5] closed- form expressions of the proximity operators of 
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various functions in ro(M) (in the case of functions such as | • proximity operators imphcitly 
appear in several places, e.g., [3[ |4[ |35]). 

From a signal processing perspective, proximity operators have a very natural interpretation 
in terms of denoising. Let us consider the standard denoising problem of recovering a signal 
X £ from an observation 

y = X + w, (13) 

where w S models noise. This problem can be formulated as ([9]), where || • — y|p/2 plays the 
role of a data fidelity term and where / models a priori knowledge about x. Such a formulation 
derives in particular from a Bayesian approach to denoising [211 11241 1126j in the presence of 
Gaussian noise and of a prior with a log-concave density exp(— /). 



3 Forward-backward splitting 

In this section, we consider the case of m — 2 functions in ([T|) , one of which is smooth. 

Problem 3.1 Let /i e ro(M^), let /2 : ^ R be convex and differentiable with a /3- 
Lipschitz continuous gradient V/2, i.e., 

(V(x,2/)eM^xM^) |lV/2(x)-V/2(y)|l </3||x-y|l, (14) 

where /? € ]0, +oo[. Suppose that fi{x) + f2{x) +00 as ||x|| — > +00. The problem is to 

minimize /i (x) + /2 (x) . (15) 

It can be shown f55] that Problem 13.11 admits at least one solution and that, for any 7 £ 
]0, +00 [, its solutions are characterized by the fixed point equation 

x = wox^f^ {x ~ 7V/2(a;)) . (16) 

This equation suggests the possibility of iterating 

X„+i = prOX^^j^ { x„ - -jn^hi^n)) (17) 
backward stop forward step 

for values of the step-size parameter 7„ in a suitable bounded interval. This type of scheme is 
known as a forward-backward splitting algorithm for, using the terminology used in discretiza- 
tion schemes in numerical analysis [132j , it can be broken up into a forward (explicit) gradient 
step using the function /2, and a backward (implicit) step using the function /i. The forward- 
backward algorithm finds its roots in the projected gradient method [94J and in decomposition 
methods for solving variational inequalities [99j I119j . More recent forms of the algorithm and 
refinements can be found in [531 HOI SHI IHS 1130) . Let us note that, on the one hand, when 
/i = 0, (|17p reduces to the gradient method 

Xn+i = a;„ - 7„V/2(a;„) (18) 

for minimizing a function with a Lipschitz continuous gradient [19[ 161] . On the other hand, 
when /2 = 0, ()17p reduces to the proximal point algorithm 

Xn+l = prox^^^^Xn (19) 

for minimizing a nondifferentiable function [5S1 HSl HH IMl 1115] . The forward-backward algo- 
rithm can therefore be considered as a combination of these two basic schemes. The following 
version incorporates relaxation parameters (A„)„gN- 
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Algorithm 3.2 (Forward- backward algorithm) 

Fix £ e]0,niin{l,l//3}[, xq £ 
For n = 0, 1, . . . 

7„ e [e, 2//3 - £] 
yn = - 7nV/2(a;„) 

Xn+1 = Xn + A„(prOX^^y^y„ - Xn). 

Proposition 3.3 35' Every sequence (a;„)„gN generated by Alaorithm \3.2\ converges to a so- 
lution to Problem \3.1l 

The above forward-backward algorithm features varying step-sizes (7n)nGN but its relaxation 
parameters (A„)ngN cannot exceed 1. The following variant uses constant step-sizes and larger 
relaxation parameters. 

Algorithm 3.4 (Constant-step forward-backward algorithm) 

Fix£e]0,3/4[and xq S 
For n = 0,1,... 

Vn^Xn- ^"^V/2(a;„) 

A„e[e,3/2-e] (21) 

Xn+l = Xn + A„(pr0X^-i^^J/„ - Xn). 

Proposition 3.5 IZ Every sequence {xn)neN generated by Algorithm ] 3. 4\ converges to a so- 
lution to Problem\3A\ 



Although they may have limited impact on actual numerical performance, it may be of 
interest to know whether linear convergence rates are available for the forward-backward al- 
gorithm. In general, the answer is negative: even in the simple setting of Example 13.111 
below, linear convergence of the iterates {xn)nm generated by Algorithm 13.21 fails [H il39j . 
Nonetheless it can be achieved at the expense of additional assumptions on the problem 
[M[Mll40ll6TllMllMl [TTM[TTm[T]^[T44] . 

Another type of convergence rate is that pertaining to the objective values (/i(a;„) -f 
f2{xn))nm- This rate has been investigated in several places [111 HH and variants of 
Algorithm O have been developed to improve it [H [IB [51 UHl EHSl [Ell [SS] in the spirit of 
classical work by Nesterov |106) . It is important to note that the convergence of the sequence 
of iterates {xn)n&i, which is often crucial in practice, is no longer guaranteed in general in such 
variants. The proximal gradient method proposed in [16l[T5] assumes the following form. 



Algorithm 3.6 (Beck-Teboulle proximal gradient algorithm) 

Fix xq G M^, set zq ~ xq and t^ = 1 
For 71 = 0, 1, . . . 

Vn^ Zn- /3"^V/2(2;„) 

Xn+l = prox^-iy^y„ 



1 + ^Ul + 1 

tn+l — l^^j 



A„ = 1 



2 

t„ - 1 



Zn-\-l — Xn -t" ^n{Xn+l -^n)- 

While little is known about the actual convergence of sequences produced by Algorithm [ 
the 0(l/n^) rate of convergence of the objective function they achieve is optimal |103j . although 
the practical impact of such property is not always manifest in concrete problems (see Figure [2] 
for a comparison with the Forward-Backward algorithm). 
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Figure 1: Proximity operator of the function 



-oo, +00] : ^ I— < 



- ln(^ - w) + ln(-a;) 
-ln(a; -^) + In(aJ) 
-00 



if eek,o] 

if CG]OM 
otherwise. 



The proximity operator thresholds over the interval [l/w, l/w], and saturates at — c« and +00 with 
asymptotes at lo and uJ, respectively (see Table I2lxiiil and [53j ) . 



Proposition 3.7 16 Assume that, for every y G domfi, dfi{y) ^ 0, and let x be a solution 
to Problem \3.1\ Then every sequence {xn)n&i generated by Alaorithm \3.6\ satisfies 

(VneN\{0}) f,{x,,) + Mxn)<h{x) + h{x) + ^^f^-^. (23) 

(n + ly 

Other variations of the forward-backward algorithni have also been reported to yield im- 
proved convergence profiles O [70l Ell [T34 1 IT35 ] . 

Problem 13.11 and Proposition 13.31 cover a wide variety of signal processing problems and 
solution methods [55]. For the sake of illustration, let us provide a few examples. For notational 
convenience, we set A„ = 1 in Algorithm 13. 2[ which reduces the updating rule to p7)) . 

Example 3.8 (projected gradient) In Problem 13.11 suppose that /i — lc, where C is a 
closed convex subset of M.^ such that {x G C \ f2{x) < 77} is nonempty and bounded for some 
77 G M. Then we obtain the constrained minimization problem 

minimize J2{x). (24) 
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Since prox^ — Pc (see Table lllxiil) , the forward-backward iteration reduces to the projected 
gradient method 

Xn+l^ Pcixn-lnVhiXn)). £ < 7n < 2//3 - £. (25) 

This algorithm has been used in numerous signal processing problems, in particular in total 
variation denoising [34] , in image deblurring [18] , in pulse shape design [50] , and in compressed 
sensing [73]. 

Example 3.9 (projected Landweber) In Example 13.81 setting f2 - x !—> \\Lx — y\\'^/2, where 
i e K^^^^ \ {0} and y e M^^, yields the constrained least -squares problem 

minimize — — ?/||^. (26) 

Since V/2 : x i-> {Lx — y) has Lipschitz constant (3 — H^IP, (l?5|) yields the projected Landwe- 
ber method |i68) 

Xn+l^Pc{xn+lnL^{y-LXn)), £ < 7« < 2/ 11 if " £. (27) 

This method has been used in particular in computer vision |89| and in signal restoration [129] . 

Example 3.10 (backward- backward algorithm) Let / and g be functions in ro(M^). 
Consider the problem 

minimize /(x) -I- g(x), (28) 

where g is the Moreau envelope of g (see Table [Tlvii)) . and suppose that f{x) +g[x) — > +oo as 
— > +00. This is a special case of Problem 13.11 with fi — f and /2 = g. Since V/2: x 1— 
X — proXgX has Lipschitz constant (3 — 1 55, 102 , Proposition 13 . 31 with 7„ = 1 asserts that the 
sequence {xn)neN generated by the backward-backward algorithm 

Xn+i = proxy (proXgX„) (29) 

converges to a solution to (pSj) . Detailed analyses of this scheme can be found in [TJ [TU US] 1108] . 

Example 3.11 (alternating projections) In Example l3.10[ let / and g be respectively the 
indicator functions of nonempty closed convex sets C and D, one of which is bounded. Then 
([28l) amounts to finding a signal x in C at closest distance from D, i.e., 

minimize — d^(x). (30) 

xgc 2 

Moreover, since proxy = Pq and prox^ — Po, (|29p yields the alternating projection method 

Xn+l ^ PciPoXn), (31) 

which was first analyzed in this context in |41| . Signal processing applications can be found in 
the areas of spectral estimation [80 , pulse shape design [107) . wavelet construction .109] . and 
signal synthesis [140] . 



Example 3.12 (iterative thresholding) Let (6fe)i<fe<jv be an orthonormal basis of R^, let 
{^k)i<k<N be strictly positive real numbers, let L e R*^^^ \ {0}, and let y G M*^. Consider 
the i^~£'^ problem 

minimize ^ w^lx^fefcl + - yf . (32) 
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Figure 2: Forward-backward versus Beck-Teboulle : As in Example [XTTl let C and D be two closed 
convex sets and consider the problem (j30p of finding a point Xqo in C at minimum distance from 
D. Let us set /i = lq and /2 = d'jj/2. Top: The forward-backward algorithm with 7„ = 1.9 and 
A„ = 1. As seen in Example 13. IH it reduces to the alternating projection method (jSip . Bottom: 
The Beck-Teboulle algorithm. 
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This type of formulation arises in signal recovery problems in which y is the observed signal 
and the original signal is known to have a sparse representation in the basis {bk)i<k<N , e.g., 
[m [201 [Ml ES [73 [23 [1251 [127]. We observe that ^ is & special case of ^ with 



f^:x^\\Lx-yf/2. 



(33) 



Since prox^y^ : x ^ ^i<k<N ^'^^H-i'^k,i'^k]i^^ ^k) bk (see Table HTviiil and Table l^ln)) . it fol- 
lows from Proposition 13.31 that the sequence (a;„)„gN generated by the iterative thresholding 
jrithm 



^k,n = SOft[_^^^^_^„^^] {Xn + InL^iy " LXn))^ bk ^^^^ 



JV 

Xn+1 = '^£.k,nbk, where 
fc=i 



converges to a solution to (15^ . 



£<7n<2/||Lf -e, 



Additional applications of the forward-backward algorithm in signal and image processing 
can be found in ^33 [H [H [31 [Ml [SZl [53 [13 [S71 [71] . 

4 Douglas-Rachford splitting 

The forward-backward algorithm of Section[3requires that one of the functions be differentiable, 
with a Lipschitz continuous gradient. In this section, we relax this assumption. 

Problem 4.1 Let /i and /2 be functions in ro(IR.^) such that 

(ridom/i) n (ridom/2) 7^ (35) 
and fi{x) + 72(2;) — > -l-oo as ||a;|| +00. The problem is to 

minimize fi(x] + f2(x]. (36) 

What is nowadays referred to as the Douglas-Rachford algorithm goes back to a method 
originally proposed in [60] for solving matrix equations of the form u = Ax + Bx, where A and 
B are positive-definite matrices (see also |132| ). The method was transformed in [9F to handle 
nonlinear problems and further improved in |96| to address monotone inclusion problems. For 
further developments, see [13 [13 [SI]- 

Problem 14. II admits at least one solution and, for any 7 G ]0, +oo[, its solutions are charac- 
terized by the two-level condition [55] 

U = pro^^f^y ^g^^ 
[prox^^^y = prox^y^(2prox^^^y - y), 

which motivates the following scheme. 

Algorithm 4.2 (Douglas-Rachford algorithm) 

Fix £ e]0, 1[, 7 > 0, yo e 
For n = 0, 1, 



_L , . . . 

Xn = prox^y^y„ 



A„ e [e, 2 - e] (38) 
_ y„+i = y„ -I- A„ (prox^^^ (2xn - Vn) - Xn) ■ 
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Proposition 4.3 52^ Every sequence (a;„)„gN generated by Algorithm \4.2\ converges to a so- 
lution to Problem \4.1\ 

Just like the forward-backward algorithm, the Douglas-Rachford algorithm operates by 
splitting since it employs the functions /i and /2 separately. It can be viewed as more general 
in scope than the forward-backward algorithm in that it does not require that any of the 
functions have a Lipschitz continuous gradient. However, this observation must be weighed 
against the fact that it may be more demanding numerically as it requires the implementation 
of two proximal steps at each iteration, whereas only one is needed in the forward-backward 
algorithm. In some problems, both may be easily implementable (see Fig. [3] for an example) 
and it is not clear a priori which algorithm may be more efhcient. 

Applications of the Douglas-Rachford algorithm to signal and image processing can be found 
in [Ml 1521 1621 [631 ITT71ITT81IT23] . 

The limiting case of the Douglas-Rachford algorithm in which A„ = 2 is the Peaceman- 
Rachford algorithm |48l |66l [Ml • Its convergence requires additional assumptions (for instance, 
that /2 be strictly convex and real- valued) [49] . 

5 Dykstra-like splitting 

In this section we consider problems involving a quadratic term penalizing the deviation from 
a reference signal r. 

Problem 5.1 Let / and g be functions in ro(M''^) such that dom/ H dom g ^ 0, and let 
r e M^. The problem is to 

minimize f(x) + gix) H — ||a; — rip. (39) 

£CGB« ' 2 

It follows at once from ([9]) that Problem IS . ll admits a unique solution, namely x — proxy^^ r. 
Unfortunately, the proximity operator of the sum of two functions is usually intractable. To 
compute it iteratively, we can observe that (p9| can be viewed as an instance of (pB)) in Prob- 
lem l4.1l with /i = / and /2 — g+\\- — r|p/2. However, in this Douglas-Rachford framework, the 
additional qualification condition pSI) needs to be imposed. In the present setting we require 
only the minimal feasibility condition dom / H Aom.g ^ 0. 

Algorithm 5.2 (Dykstra-like proximal algorithm) 

Set a;o = r, po = 0, go = 
For n = 0, 1, . . . 

yn = proXg(a:„ + Pn) 

Pn+l Xfi + Pii 2/ri (40) 
Xn+1 = pm^fiUn + qn) 
. Qn+l = Vn + qn - Xn+1- 

Proposition 5.3 12 Every sequence (x„)„eN generated by Algorithm \5.2\ converges to the 
solution to Problem IS. 1\ 

Example 5.4 (best approximation) Let / and g be the indicator functions of closed convex 
sets C and D, respectively, in Problem IS. II Then the problem is to find the best approximation 
to r from C D D, i.e., the projection of r onto C O D. In this case, since proxj = Pq and 
proXg — Po, the above algorithm reduces to Dykstra's projection method [22] [64]. 

Example 5.5 (denoising) Consider the problem of recovering a signal x from a noisy obser- 
vation r = X + w, where w models noise. If / and g arc functions in ro(M^) promoting certain 
properties of x, adopting a least-squares data fitting objective leads to the variational denoising 
problem 
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Figure 3: Forward-backward versus Douglas- Rachford: As in Example 13. IH let C and D be two 
closed convex sets and consider the problem (|30p of finding a point Xqo in C at minimum distance 
from D. Let us set /i = lc and /2 = d'j-)/2. Top: The forward-backward algorithm with 7„ = 1 and 
An = 1. As seen in Example 13.111 it assumes the form of the alternating projection method (j31|) . 
Bottom: The Douglas-Rachford algorithm with 7 = 1 and A„ = 1. Table [Tlxiil vields proxj^ = Pq 
and Table lllvil yields proxj^ : x 1— )• {x + Pdx)/2. Therefore the updating rule in Algorithm 14.21 
reduces to Xn = {Vn + Poyn)/'^ and = Pc{2xn - Vn) + Vn - Xn = PciPoyn) + yn- Xn- 
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6 Composite problems 



We focus on variational problems with m = 2 functions involving explicitly a linear transfor- 
mation. 

Problem 6.1 Let / e ro(M^), let g e ro(R*0. and let L e R^'^xn ^ {q} be such that 
donig n i(dom/) 7^ and f{x) + g{Lx) — > +00 as ||x|| — > +00. The problem is to 

minimize f(x) + g{Lx). (41) 

Our assumptions guarantee that Problem 16. II possesses at least one solution. To find such 
a solution, several scenarios can be contemplated. 



6.1 Forward-backward splitting 

Suppose that in Problem 16.11 g is differentiable with a r-Lipschitz continuous gradient (see 
(IT4l) ). Now set /i = / and /2 = go L. Then /2 is differentiable and its gradient 



V/2 o V.g o L 



(42) 



is /3-Lipschitz continuous, with (3 = T||i||^. Hence, we can apply the forward-backward splitting 
method, as implemented in Algorithm 13.21 As seen in ((20)) . it operates with the updating rule 



(43) 



7„e [£,2/(r||L|p)-e] 

Vn = Xn - JnL^^g{LXn) 

K e [e, 1] 

Xn+i = Xn + A„(prox^^yy„ - Xr, 



Convergence is guaranteed by Proposition [3]3] 



6.2 Douglas- Rachford splitting 

Suppose that in Problem 16.11 the matrix L satisfies 

LL^ = vI, where ve]Q,+oo[ (44) 

and (ridom(;)nriL(dom/) 7^ 0. Let us set /i = / and /2 = g°L. As seen in Table fllxl proxy^ 
has a closed- form expression in terms of prox^ and we can therefore apply the Douglas- Rachford 
splitting method (Algorithm 14. 2p . In this scenario, the updating rule reads 

K e [e, 2 - e] (45) 
. J/n+i = Vn + A„(prox^^(2x„ - y„) - a;„). 

Convergence is guaranteed by Proposition 231 



6.3 Dual forward-backward splitting 

Suppose that in Problem iH / = /i+||--r||V2, where h e ro(M^) and r e M^. Then (gl]) 
becomes ^ 

minimize h(x) A- qiLx) -\ — llx — rlP, (46) 
xeR" ' 2 

which models various signal recovery problems, e.g., [331 IMl ISIl IMl 11121 1138) . If (03]) holds, 
proXgo^ is decomposable, and (|46l) can be solved with the Dykstra-like method of Section O 
where fi = h + \\ ■ -~r\^ (see Table [TTivl) and f2 ^ g o L (see Table fllx)) . Otherwise, we 
can exploit the nice properties of the Fenchel-Moreau-Rockafellar dual of (|46|) . solve this dual 
problem by forward-backward splitting, and recover the unique solution to ()46|) 
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Algorithm 6.2 (Dual forward-backward algorithm) 

Fix e e ]0,min{l,l/||L|l2}[, G M*^ 
For n = 0, 1, . . . 

Xn = prox;j(r - L^u„) 

7„e [e,2/||Lf -e] ^^^^ 
A„ e [e, 1] 

Un+i = Mn + A„ (prox^^g. (u„ + 7„La;„) - m„) . 

Proposition 6.3 'SI; Assume that (ridom(7)nriL(doni/i) ^ 0. Then every sequence (x„)„gN 
generated by the dual forward-backward alaorithm \6.S\ converges to the solution to ()46p . 



6.4 Alternating-direction method of multipliers 

Augmented Lagrangian techniques are classical approaches for solving Problem l6.1l |77[ [78 ] (see 
also [75l[79]). First, observe that (j4T|) is equivalent to 



minimize f{x)+g{y)- (48) 

Lx=y 

The augmented Lagrangian of index 7 € ]0, +00 [ associated with (|48p is the saddle function 

C^:R^ X R*^ X M*^ ^ ]-oo, +cx)] 

(a;, z) ^ fix) + g{y) + -z^{Lx - y) + - (49) 

7 27 

The alternating-direction method of multipliers consists in minimizing over a;, then over 1/, 
and then applying a proximal maximization step with respect to the Lagrange multiplier z. 
Now suppose that 

L is invertible and (ridomg) n riL(dom/) 7^ 0. (50) 

By analogy with ([9]), if we denote by proxy' the operator which maps a point y € M*^ to the 
unique minimizer of x 1— > f{x) + ULa; — ?/|p/2, we obtain the following implementation. 

Algorithm 6.4 (Alternating-direction method of multipliers (ADMM)) 

Fix 7 > 0, yo e R*^ zo G R*^ 
For n = 0, 1, . . . 

Xn = prox:^^(?;„ - z„) 

" LXfi /r 1 ^ 

/ , ^ (51) 
y„+i = prox^g(s„ + z„) 

The convergence of the sequence (a;„)„gN thus produced under assumption ([50]) has been 
investigated in several places, e.g., [75l [77l [79]. It was first observed in [76] that the ADMM 
algorithm can be derived from an application of the Douglas-Rachford algorithm to the dual of 
(|4T|) . This analysis was pursued in [66|, where the convergence of (x„)„gN to a solution to (|4T|) 
is shown. Variants of the method relaxing the requirements on L in (I50|) have been proposed 

In image processing, ADMM was applied in [81 to an regularization problem under the 
name "alternating split Bregman algorithm." Further applications and connections are found 
in [2l[69l[n2l[l43]. 
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7 Problems with m > 2 functions 

We return to the general minimization problem ([T]). 
Problem 7.1 Let /i,. . . ,/,„ be functions in ro(M^) such that 

(ridom/i) n • • • n (ridom/„0 7^ (52) 
and /i(a;) + • • ■ + fm{x) +oo as ||a;|| +oo. The problem is to 

minimize + • • • + (53) 

Since the methods described so far are designed for m = 2 functions, we can attempt to 
reformulate (j53p as a 2-function problem in the m-fold product space 

?i = X • • • X (54) 

(such techniques were introduced in [1101 1111) and have been used in the context of convex 
feasibility problems in [TOl |43l |45] ) . To this end, observe that (|53| can be rewritten in H as 

minimize /i(a;i) H V fm{xm). (55) 

(a:i,...,a:„)eW 

X\—---—Xm 

If we denote hy x — (xi, . . . , Xm) a generic element in ([55)1 is equivalent to 

minimize ld{x) + f{x), (56) 

where 

(D^{ix,...,x)&n\x€R''} ^^^^ 
[f : X ^ .fi{xi) H h/,„(a;„0- 

We are thus back to a problem involving two functions in the larger space 7i. In some cases, 
this observation makes it possible to obtain convergent methods from the algorithms discussed 
in the preceding sections. For instance, the following parallel algorithm was derived from the 
Douglas-Rachford algorithm in [M] (see also [IH] for further analysis and connections with 
Spingarn's splitting method |120) ). 

Algorithm 7.2 (Parallel proximal algorithm (PPXA)) 

Fix e e ]0, 1[, 7 > 0, {uJi)i<i<m G ]0, 1]"' such that 
E"i^» = l, yi,o€M^,...,2/™,oeR^ 

Set a;o = Z)" i '^iVi.o 
For n = 0,1,... 
For i = 1 , . . . , m 

rn 

Pn = ^ (^iPi,n 
i=l 

e < A„ < 2 - e 
For « = 1 , . . . , m 

[ Vi,n+1 — Vi.n + ^n{'^Pn ~ Xn — Pi,ri) 
Xn+1 — Xji -|- Xn{Pn -^n)- 

Proposition 7.3 54 Every sequence {xn)neN generated by Algorithm \7.2\ converges to a so- 
lution to Problem \7. 1\ 
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Example 7.4 (image recovery) In many imaging problems, we record an observation y £ 
M.^ of an image z G K'^ degraded by a matrix L S K^^^ and corrupted by noise. In the 
spirit of a number of recent investigations (see [37] and the references therein) , a tight frame 
representation of the images under consideration can be used. This representation is defined 
through a synthesis matrix G E-^^^ (with K < N) such that F — vl, for some 
V € ]0, +00 [. Thus, the original image can be written as z = i^^x, where x G is a vector of 
frame coefficients to be estimated. For this purpose, we consider the problem 

minimize \\\LF^ x - y\\^ + <^{x) + iY{F^ x), (58) 
x&c 2 

where C is a closed convex set modeling a constraint on z, the quadratic term is the standard 
least-squares data fidelity term, $ is a real- valued convex function on (e.g., a weighted £^ 
norm) introducing a regularization on the frame coefficients, and tv is a discrete total variation 
function aiming at preserving piecewise smooth areas and sharp edges [116j . Using appropriate 
gradient filters in the computation of tv, it is possible to decompose it as a sum of convex 
functions (tvi)i<i<g, the proximity operators of which can be expressed in closed form |54| ,lll3j . 
Thus, ([55t appears as a special case of ([55)1 with m = q + 3, fi — lc , f2 ^ 11-^^^ • ~y|P/2, 
/a = and /a+i = tvi{F'^ ■) for i G {1, . . . , q}. Since a tight frame is employed, the proximity 
operators of /2 and (/3+i)i<i<g can be deduced from Table [Hxl Thus, the PPXA algorithm is 
well suited for solving this problem numerically. 

A product space strategy can also be adopted to address the following extension of Prob- 
lem [O] 

Problem 7.5 Let /i, ...,/,„ be functions in ro(M^) such that dom/i n • • • n dom/„j ^ 0, 
let {u!i)i<i<„i e ]0, 1]™ be such that X^IIli '^i ^ 1' ^'^"^ 1*^^ ^ ■ The problem is to 

minirnize ''^^Ldifi{x) + —\\x — r\\'^ . (59) 



Algorithm 7.6 (Parallel Dykstra-like proximal algorithm) 

Set xq = r, zi,o = xq, . . . , z^.o = 
For n — 0,1, . . . 

For i — 1, . . . ,m 

[ Pi^n = prOXy^Zi,„ 

Xn+l = I]™ 1 t^iPi,n (60) 

For i — 1, . . . ,m 



Proposition 7.7 49 Every sequence {xn)n£N generated by Algorithm |7.6] converges to the 
solution to Problem\7.5\ 



Next, we consider a composite problem. 



Problem 7.8 For every i e {1, . . . ,m}, let e ro(M^^') and let L, e R*^'^^. Assume that 

(3(7 eR^) iiq e ri domgi, . . . , € ri dom gm, (61) 

that gi{Lix) + ■ ■ ■ + g„i{Lmx) — > +00 as ||a;|| — > +00, and that Q = J2i<i<m -^J -^i invertible. 
The problem is to 

minimize gi(Lix) -!-•••+ gm(Lmx). (62) 
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Proceeding as in ([55]) and ([5^ . can be recast as 

minimize LD{x)+g{y), (63) 

y=Lx 

where 

?i = X • • • X R^, g = R*^i X • • • X R*^" 

L:-H^Q:x^{LiXi,...,Lr„Xrn) (64) 
g: ^ ^ ]-oo,+oo] : y h-> ^1(2/1) H h5m(ym)- 

In turn, a solution to can be obtained as the hmit of the sequence (a:„)„gN constructed 
by the following algorithm, which can be derived from the alternating-direction method of 
multipliers of Section [HH] (alternative parallel offsprings of ADMM exist, see for instance [BS]). 

Algorithm 7.9 (Simultaneous-direction method of multipliers (SDMM)) 

Fix 7 > 0, 2/1,0 e K^^i , . . . , 2/™,o e R^^" , zi,o e R^^i , . . . , z™,o e R*^'" 

For n = 0, 1, . . . 
For i = 1 , . . . , TO 

/ (65) 
Vi.7i+i = prox^g.(si,„ + Zi,„) 

This algorithm was derived from a slightly different viewpoint in 118 with a connection with 
the work of [71]. In these papers, SDMM is applied to deblurring in the presence of Poisson 
noise. The computation of Xn in ()65p requires the solution of a positive-definite symmetric 
system of linear equations. Efficient methods for solving such systems can be found in [82]. In 
certain situations, fast Fourier diagonalization is also an option [2l ITT]. 

In the above algorithms, the proximal vectors, as well as the auxiliary vectors, can be 
computed simultaneously at each iteration. This parallel structure is useful when the algorithms 
are implemented on multicore architectures. A parallel proximal algorithm is also available to 
solve multicomponent signal processing problems |27) . This framework captures in particular 
problem formulations found in [7] [51 [501 IHHl I133| . Let us add that an alternative splitting 
framework applicable to (|53p was recently proposed in |67] . 



8 Conclusion 

We have presented a panel of convex optimization algorithms sharing two main features. First, 
they employ proximity operators, a powerful generalization of the notion of a projection oper- 
ator. Second, they operate by splitting the objective to be minimized into simpler functions 
that are dealt with individually. These methods are applicable to a wide class of signal and 
image processing problems ranging from restoration and reconstruction to synthesis and de- 
sign. One of the main advantages of these algorithms is that they can be used to minimize 
nondifferentiable objectives, such as those commonly encountered in sparse approximation and 
compressed sensing, or in hard-constrained problems. Finally, let us note that the variational 
problems described in (|39|) . p6)) . and (|59)) . consist of computing a proximity operator. There- 
fore the associated algorithms can be used as a subroutine to compute approximately proxim- 
ity operators within a proximal splitting algorithm, provided the latter is error tolerant (see 
[48l l49l [5T] [66] I115| for convergence properties under approximate proximal computations) . An 
application of this principle can be found in [3S] . 
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Table 1: Properties of proximity operators [271 EZl [531 [Ml [IHS]: 99 e ro(M^); C C is 



nonempty, closed, and convex; j; G M 



Property 




proxj x 


i translation 


ip{x -z), ze 


z + prox<^(x - z) 


ii scaling 


ip{x/p), /9 G M \ {0} 


PWO^^/p2 (x/p) 


iii reflection 


f{-x) 


-prox^(-x) 


iv quadratic 
perturbation 


ip{x) + a X p/2 + uJ X + 7 

^, HI) A* ^ \ n ^. r- HI) 

wEM. ,a>(J, 7GM 


prox<^/(Q+i) {{x - u)/{a + 1)) 


V conjugation 




X — prox^x 


vi squared distance 
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-(x + Pc'x) 


vii Moreau envelope 
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ip{x)= inf w(y) + -||x-yf 

y&M.^ 2. 
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- (x + prox2^x) 


viii Moreau complement 
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l^\\-f-^{x) 
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ix decomposition 

in Jin orf MnrinrTTi?il 

basis {h)i<k<N 
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X semi-orthogonal 
linear transform 
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Table 2: Proximity operator of (p gTq 
|37l [531155]. 



a £R, K > 0, K> 0, K > 0, u > 0, ui< to, q > 1, T > 
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where p > and p^'^ ^ — xp'' ^ = q 
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such that p' 
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where W is the Lambert W-function 



^(x + uj_ + ^\x - + 4^ if X <l/u 
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(see Figure [T]) 
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